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The Brown-Resnick max-stable process has proven to be well- 
suited for modeling extremes of complex environmental processes, 
but in many applications its likelihood function is intractable and 
inference must be based on a composite likelihood, thereby prevent¬ 
ing the use of classical Bayesian techniques. In this paper we exploit 
a case in which the full likelihood of a Brown-Resnick process can 
be calculated, using componentwise maxima and their partitions in 
terms of individual events, and we propose two new approaches to 
inference. The hrst estimates the partitions using declustering, while 
the second uses random partitions in a Markov chain Monte Carlo 
algorithm. We use these approaches to construct a Bayesian hierar¬ 
chical model for extreme low temperatures in northern Fennoscandia. 


1. Introduction. Extreme events such as heat or cold waves have ma¬ 
jor impacts on human health and ecological sustainability. Motivated by the 
pressing need for better risk assessment, inference for spatial extremes has 
developed rapidly during the last decade (e.g., de Haan and Ferreira, 2006; 
Davison, Padoan and Ribatet, 2012; Davison, Huser and Thibaud, 2013; 
Ribatet, 2013). Max-stable processes are the only possible non-degenerate 
limits for rescaled pointwise maxima of random processes, thus motivating 
their use in modeling spatial extremes. 

Flexible max-stable models have been developed for inference on spatial 
extremes. Over a variety of applications, the Brown-Resnick process (Brown 
and Resnick, 1977; Kabluchko, Schlather and de Haan, 2009) has proven to 
be a valuable model for extremes of environmental variables, such as rain¬ 
fall, temperature, wind speed and river discharge (see, e.g., Davison, Huser 
and Thibaud, 2013; Engelke et ah, 2015; Asadi, Davison and Engelke, 2015; 
Buhl and Kliippelberg, 2016). It is built using Gaussian processes and its 
dependence structure is determined by a variogram, so well-established ideas 
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from classical geostatistics may be exported to the extremal context. The 
greater challenge is inference, since the full likelihood of the Brown-Resnick 
process is tractable only in relatively low-dimensional settings (Huser and 
Davison, 2013; Castruccio, Huser and Genton, 2015). Composite likelihood 
techniques have been successfully used to fit max-stable models to block 
maxima or threshold exceedances, but with reduced estimation efficiency. 
More efficient procedures based on full, possibly censored, likelihoods (En- 
gelke et ah, 2015; Wadsworth and Tawn, 2014; Thibaud and Opitz, 2015) 
have been developed for threshold exceedances, allowing the construction of 
more elaborate models for spatial extremes. 

Bayesian hierarchical models are appealing in spatial statistics because of 
the flexibility with which they can track the spatial variation of marginal dis¬ 
tributions. Casson and Coles (1999), Cooley, Nychka and Naveau (2007) and 
Sang and Gelfand (2009) constructed hierarchical models for extremes as¬ 
suming conditional independence of extreme occurrences. Such models may 
have flexible margins but cannot produce realistic simulations of extreme 
events (Davison, Padoan and Ribatet, 2012). Sang and Gelfand (2010) and 
Fuentes, Henry and Reich (2013) used hierarchical models that model spatial 
dependence using Gaussian processes, a somewhat restrictive class of models 
that are asymptotically independent and thus may underestimate potential 
dependence at the most extreme levels (Davison, Huser and Thibaud, 2013). 
Ribatet, Cooley and Davison (2012) and Shaby (2014) proposed the use of 
composite likelihoods in Markov chain Monte Carlo (MCMC) algorithms 
and applied them for Bayesian inference on max-stable models, but these 
methods present computational challenges and do not yield exact simula¬ 
tion from the posterior distribution. Shaby and Reich (2012) constructed a 
max-stable spatial model based on a hierarchical representation of logistic 
extreme-value distributions (Stephenson, 2009). 

In this paper we propose a Bayesian hierarchical model based on the 
Brown-Resnick process, which exploits a special case for which the full like¬ 
lihood of such a process can be calculated, using the joint distribution of 
componentwise maxima and their partitions in terms of individual events 
(Stephenson and Tawn, 2005). We discuss the difficulties of using the parti¬ 
tions in practice, and we propose a new Bayesian approach to fitting max- 
stable processes without fixed partitions. 

Our work is motivated by the changing climate of northern Fennoscan- 
dia. In northern regions of Norway, Finland, Russia and Sweden, extreme 
low temperatures play a central role in ecological sustainability by limiting 
the extent of outbreaks of forest pests such as the autumnal moth Epirrita 
autumnata, whose larvae have caused serious damage by defoliating moun- 
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tain birch. Forests in the region have mostly been protected from this pest 
by extreme low winter temperatures, since its eggs cannot survive below 
—36°C (Virtanen, Neuvonen and Nikula, 1998). Winter minimum tempera¬ 
tures are often below this threshold, but in recent years they have increased 
and projections suggest they will increase even more in the future (Stocker 
et ah, 2013, §E.l). Although temperature is not the only factor limiting the 
spread of the moth, it is one of the most important, and thus plays a central 
role in the ecology of northern Fennoscandia. In §4.5 we discuss the possi¬ 
ble changes in winter minimum temperatures in the region, focusing on the 
probability that they will be less than —36°C. 

Motivated by recent disastrous heat wave events over the world, max- 
stable processes and related methods have been used to model temperature 
extremes and their potential changes in time. Davison and Gholamrezaee 
(2012) used the so-called Schlather (2002) process to model annual maximum 
temperatures in Switzerland. They used latitude, longitude and elevation to 
estimate response surfaces for the marginal parameters of extremes. Fuentes, 
Henry and Reich (2013) describe a Bayesian spatial model for annual max¬ 
imum temperatures in the US based on mixtures of Gaussian distributions. 
Shaby and Reich (2012) used a Bayesian max-stable model to model extreme 
temperature data over Europe. 

Section 2 describes the data we study. Section 3 summarizes some extreme 
value ideas and introduces the inference methods for the Brown-Resnick 
model. The hierarchical model and the application are discussed in Section 4. 

2. Extreme temperatures in northern Fennoscandia. 

2.1. Dataset. We use temperature data from 20 meteorological stations 
located in northern regions of Einland, Norway and Sweden; see Fig. 1. 
The smallest and largest distances between two stations are 26 and 471 km. 
Daily minimum temperatures are available from 1960 to 2013, but with data 
missing. In view of the motivation for our application, inference is needed 
only for the winter minima. We focus on modeling minimum temperatures 
in the months December-March and we use the daily data to investigate 
their co-occurrences. 

Missing daily data censor the winter minima. Winter minimum values 
for the 20 stations are shown in Fig. 2. Our dataset has at least 26 winter 
minima for each station, and each year has values of winter minimum for at 
least 15 of the 20 stations. 

Mean winter minimum temperatures depend on local geography and to¬ 
pography, and we shall model this non-stationarity in their marginal dis¬ 
tributions using covariates and random effects, which will also allow us to 
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Eig 1. Locations of the 20 weather stations in the northern regions of Finland, Norway 
and Sweden. The dashed lines shows the convex hull of the stations, which will be our 
region of interest for simulation. 

extrapolate results to ungauged sites. 

We derived covariates that are known to influence minimum temperature 
in northern Fennoscandia (Aalto, le Roux and Luoto, 2014). We supple¬ 
mented the coordinates of the stations with their absolute elevation above 
sea level and elevation relative to the surrounding areas (calculated on a 
2 km X 2 km spatial moving average), a proximity index to the Arctic 
Ocean, calculated with a 160 km x 160 km moving window assuming that 
the effect of the ocean reaches 80 km inland, and a lake cover index, which 
expresses the proportion of area covered by lakes in a 3 km x 3 km window 
(see Aalto, le Roux and Luoto, 2014). 

2.2. Temporal stationarity. In Fig. 2 it is difficult to see the presence of 
a temporal trend with only 53 years of noisy data, but the figure suggests 
that minima may generally be slightly less extreme from 1990 onwards. The 
inclusion of a temporal effect in the model will help to assess this. 

2.3. Serial dependence. Since we focus on modeling the winter minima, 
temporal dependence in the daily temperatures is not of interest in itself. 
But our approach requires us to partition minima in terms of independent 
generating events, which is complicated by the presence of temporal de¬ 
pendence in the daily data; see §4.2.3. Cold waves stay over the region for 
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Fig 2. Winter minimum temperature (° C) 20 stations (a different color for each 

station). The horizontal line shows the —36°C threshold. 


several days, so minimum temperatures tend to be very dependent from 
day to day. Preliminary analysis (see the Supplementary Material, Thibaud 
et ah, 2016) showed that the dependence is strong for the first few days and 
vanishes after approximately five to ten days at each location. 

2.4. Spatial dependence. We expect winter minimum temperatures to be 
dependent over large regions, and this is the case in our data. Fig. 2 suggests 
that the minima for the 20 stations behave similarly for the 53 years: the 
years 1966 and 1999 were very cold at all 20 locations, whereas 1973, 1993 
and 2007 are warmer at all locations. It appears that most of the minima 
occurred on the same day or only a few days apart, so they may correspond 
to the same event, which suggests strong spatial dependence. 

Results from a preliminary analysis (see Thibaud et ah, 2016) suggest 
that pairwise distributions are asymptotically dependent and max-stable. 
After accounting for differences in the marginal distributions, we will use a 
stationary isotropic max-stable model for modeling spatial dependence. 

3. Max-stable processes and inference for spatial extremes. 

3.1. Generalities. Let Y = {y(s)}sg 5 be a random process with contin¬ 
uous sample paths defined over a compact subset S of and let Yi, 12, • • • 
be independent replicates of Y. If there exist sequences of continuous func¬ 
tions {a„(s)}sg 5 ' > 0 and {bn{s)}s£S such that we have the distributional 
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convergence 

{ a^{s)-^ [max{yi(s),..., y^s)} - 6„(s)] I ^ ^ {Z{s)} seS, n^oo, 

J sGS 

where Z = {y(s)}sg 5 is a non-degenerate random process, we say that Y 
lies in the max-domain of attraction of the process Z. Then the process Z 
mnst be max-stable (de Haan and Ferreira, 2006) and its marginals are of 
generalized extreme-value (GEV) form, 

Pr{y(s) <z} = exp (^- [1 -h C(s)o-(s)“H^ “ , s e S, 

denoted Z{s) ~ GEV{/i(s),(T(s),^(s)}, with x+ = max(0,x), and real loca¬ 
tion /i(s), scale a{s) > 0 and shape ^(s) parameters; throughout the paper, 
the case ^(s) = 0 is defined as the limit for ^(s) —>■ 0. Every max-stable 
process with marginal parameters //(s), 17 ( 5 ) and ^(s) can be written as 
fi{s) + a{s)^{s)~^{Z— 1} where Z is a simple max-stable process 
with unit Frechet margins, i.e., fi{s) = (t(s) = ^(s) = 1. Thus it suffices to 
consider simple max-stable processes, henceforth denoted by Z. 

As a consequence of max-stability, for every set of locations si,..., s/) G S, 

(1) Pr{Z(si) < zi,..., Z{sd) < zd} = exp{-y( 2 i,.. .,zd)}, 

Zi, . . . ,ZD > 0, 

where the so-called exponent function V is homogeneous of order —1, i.e., 
V{tzi ,..., tzo) = t~^V{zi,..., Zd), t > 0, and satisfies V{z,oo,..., 00 ) = 
z~^ for any permutation of its arguments. In particular, 

Pr{Z(si) < z,..., Z{sd) < z} = exp(—d/z), z > 0, 

where 9 = y(l,... ,1) G [^,D] is known as the extremal coefficient of Z 
relative to si,..., sd- The extremal coefficient can be interpreted as the 
effective number of independent variables among Z{si),..., Z{sd) and is 
useful for model checking, since it can be estimated non-parametrically. 

As they appear as limits for rescaled pointwise maxima of random pro¬ 
cesses, max-stable processes are natural models for observed block maxima. 
Moreover, the equality min{yi(s),..., y„(s)} = — max{—yi(s),..., —y„(s)} 
shows that such processes also provide natural models for block minima. 

Several parametric max-stable models have been proposed. The Smith 
(1990) process was the first to be considered but its realizations are too 
smooth for realistic modeling of complex extremes. The Schlather (2002) 
process is constructed from stationary Gaussian processes and provides 


BAYESIAN INFERENCE FOR THE BROWN-RESNICK PROCESS 


7 


more realistic realizations controlled by different correlation functions. The 
extremal-t process extends the Schlather process by introducing an addi¬ 
tional shape parameter (Opitz, 2013). Here we focus on the Brown-Resnick 
process (Brown and Resnick, 1977; Kabluchko, Schlather and de Haan, 2009) 
which appears to be a good model for extremes of environmental processes. 

3.2. Brown-Resnick process. Kabluchko, Schlather and de Haan (2009) 
constructed a stationary max-stable process, called the Brown-Resnick pro¬ 
cess, as 

(2) Z{s) = max Q'^^Wi{s), Wi{s) = exp{ei(s) - ct(s)^/2}, s € 5, 

2 = 1 , 2 ,... 

where the Qi's are the points of a unit rate Poisson process on (0, oo) and the 
Ei are independent replicates of an intrinsically stationary centered Gaussian 
process e with variance function (7(s)^ = E{e(s)^}. The dependence struc¬ 
ture of Z is inherited from that of e, which is controlled by the variogram 
2'y{h) = E[{e(s + h) — e(s)}^]. Without loss of generality, we suppose that 
e(0) = 0 with probability one, so cr{s)‘^ = 2^(s). A large class of models 
may be obtained by choosing different variograms. Later we use the stable 
variogram 2j(h) = 2(||h||/A)'^, with A > 0 and k G (0,2]. 

The exponent function of a Brown-Resnick process is (Nikoloulopoulos, 
Joe and Li, 2009) 


(3) V(zi,...,zd) = 

D 

1=1 

where $p('; /x, fi) is the cumulative distribution function of a p-dimensional 
Gaussian distribution with mean vector fi and covariance matrix fi, = 

(1,..., 1)' G r = { 7(^11 -•si2)}i<lij2<T'> denotes the {D-l) x 1 

vector consisting of the jth column of E without its jth component, T_j_j 
denotes the matrix E without its jth row and jth column, and so forth. 

Let Z = {Z(si),..., Z{sr>)} be a vector stemming from a simple max- 
stable process Z. Differentiation of (1) shows that the full likelihood contri¬ 
bution for one realization {zi, ..., zd) of Z may be written as 


( 4 ) 


E 


r K 


.k=l 


exp{-K(zi,.. .,zd)}, 


where V denotes the set of all partitions of {1,... ,iA} and the subscript 
TTfc on denotes the partial derivative of the function V with respect to 
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the components in the subset vr^. Although the likelihood has a closed-form 
expression for the Brown-Resnick process (Wadsworth and Tawn, 2014), 
the size of V equals the Bell number and is larger than 10^ for D > 10, 
making full likelihood inference computationally unattainable in many sit¬ 
uations. Hence inference for max-stable processes has typically been based 
on composite likelihoods using lower-order densities (Padoan, Ribatet and 
Sisson, 2010; Castruccio, Huser and Genton, 2015). Not only do compos¬ 
ite likelihoods result in a loss of efficiency compared to full likelihood esti¬ 
mation, they also increase the computational burden when employed in a 
Bayesian analysis, because the composite likelihood does not directly yield a 
posterior which appropriately accounts for parameter uncertainty (Ribatet, 
Cooley and Davison, 2012; Shaby, 2014). Engelke et al. (2015), Wadsworth 
and Tawn (2014) and Thibaud and Opitz (2015) developed full likelihood 
inferences for threshold exceedances of processes in the max-domain of at¬ 
traction of a Brown-Resnick process, which, while challenging, have lower 
computational burdens than does (4). 

For our likelihood, we employ the result of Wadsworth and Tawn (2014), 
who showed that, for the Brown-Resnick process, partial derivatives of V 
with respect to the first d indices, denoted Vi-d, satisfy 


(5) -VU^) = 


^D-d 






X exp 




1 

4 r^qd I'^Qd 


1 

^'d(id 


xexp 



jlog z[.dAi,d log zi,d + log z[.^ 






(idq'd^d \' 

^'d^d jJ ’ 


withf = (K'lAR:ol)-^T = -f(K'iAR:oi log zi,rf+K'iI]-Az,/l'^I]-Az,), 
S = {7(sji) -k - 7(sji - Sj2)}i<iij2<£'> ^i.d the dx d matrix con¬ 

sisting of the first d rows and columns of S, A = 5 ]“^ — gg'/l^q, Ai-d = 

- qdq'd/l'dqd, q = qd = S^r^ld, O- = diag(F:) and ad = ai,d, 

where 


Kio = 


^D-d,d J 


Koi = 


Od,D-d ) 

iD-d J 


3.3. The Stephenson-Tawn model. Stephenson and Tawn (2005) derived 
a likelihood based on componentwise block maxima and their occurrence 
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times that avoids direct differentiation of (1). Let denote in¬ 

dependent replicates of a random vector Y = {y(si), ..., y(s£))} € 
Without loss of generality, suppose that the process Y has unit Prechet 
margins and lies in the max-domain of attraction of a simple max-stable 
process Z. Let Mn = maxj=i^,,,^„ 1^/n denote the componentwise maxima, 
and let !!„ = (tti, ... , ttk) denote the unique partition of {1 ,... , D} such 
that {Mnj}j£Trk comes from the same individual observation, i.e., for each 
k e {1,... ,K}, there exists ik such that {Mnjjje-Kk = Then 

Stephenson and Tawn (2005) showed that, as re —>• oo, the block maximum 
and its associated partition (iVf„,n„) converge in distribution to (Z,If), 
whose joint density is 

(6) /(Z,n) = exp{-Y(Z)} H {-K,(Z)}. 

This suggests using (6) as a likelihood for inference on the joint distribution 
of block maxima and their occurrence times. 

This approach only requires the maximum and the partition to be ob¬ 
served, and thus represents a compromise between the block maximum 
and threshold exceedance approaches. The relative efficiency of Stephenson- 
Tawn and pairwise maximum likelihood estimators has been investigated by 
Wadsworth and Tawn (2014). Because the partition contains information 
about the dependence, the Stephenson-Tawn estimator has smaller variance 
than that based only on the block maxima. However, for finite block length, 
the former may be more biased, because the model {Z, H) is misspecified for 
i-c., both for the maximum and the partition (Wadsworth, 2015). 
Use of the limiting distribution for {Mn, H^) represents a strong assumption 
that can lead to poor fit, whereas approaches based on the maximum 
alone are more robust. 

A further difficulty arises from the definition of the partition for serially 
dependent data. The partition in the Stephenson-Tawn model is defined in 
terms of independent generating events. While it is natural to define the 
partition in terms of the days of occurrence of the maxima when the daily 
observations are independent, its estimation is harder when data show tem¬ 
poral dependence. In §4.2.3 we propose a space-time declustering procedure 
to estimate the partition. 

3.4. Random partition model. Although the use of the estimated fixed 
partition is theoretically appealing, it represents a practical challenge. First, 
the model is misspecified, which affects the estimation of the max-stable 
model by the introduction of potentially biased information. Second, the 
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partition must be estimated in a preliminary step, and inference based on 
the Stephenson-Tawn model is conditional on the fixed partition and ignores 
the uncertainty associated to its estimation. The partition is not always very 
well defined (see §4.2.3) and its estimation is sensitive to measurement error. 
This motivates us to propose an approach to estimating the parameters of 
the Brown-Resnick process, without hxing the partition, which uses only 
the values of the maxima and can be applied when the occurrence dates are 
unavailable. 

Let Z = {Z(si),..., Z{s£))} be a vector stemming from a Brown-Resnick 
process Z. To overcome the intractability of the likelihood (4), we propose to 
augment the data with the unknown partition B that generates Z in terms 
of the functions Pi = Q~^Wi{s) in (2), i.e., B = (tti, ..., ttk) is dehned such 
that the points Z{sj) in the same set come from the same function Pi. 
The joint distribution of {Z,I\) is (6). 

We impute the unknown partition B given the observed value of Z in an 
MCMC algorithm, using ideas similar to Dombry, Eyi-Minko and Ribatet 
(2013), who considered the partition in the context of conditional simulation 
from max-stable processes. The partition is imputed from its conditional 
distribution given the data. From (6) we have 


(7) 


Pr(B = TT \ Z = z) 


f{z,TT) 


Bowever, the number of partitions of the set {1,...,I1} is the D-th. Bell 
number, so (7) is intractable for large D. We use a Gibbs sampler to update 
the partition. For a partition vr = (tti, ... ,'itk)^ we choose a location Sj at 
random and let 7r_j denote the restriction of vr to {si,..., sd} \ {sj}. The 
size of TT-j is K, or {K — 1) if {sj} is itself a partitioning set of vr. Given tt-j, 
the partition is updated by attributing the location sj either to an existing 
set of 7r_j or to a new set, giving at most K + 1 choices. The conditional 
probability for a new partition vr* with vr* = 7r_j is 


( 8 ) 


Pr(B = TT* I B_j = 7 r_j, Z = z) 




The discrete distribution (8) has support on a set of size at most K + 1, so 
a new partition vr* can easily be simulated. 

In practice, inference is based on independent replicates Zi,..., Zn, and 
the MGMC algorithm independently updates the corresponding latent parti¬ 
tions Bi,..., Bat. Given the observed values of the Zi and the simulated Bj, 
the marginal and dependence parameters of the model are updated from the 
joint distribution of the (.Zj, Bj), which is a product of terms of the form (6). 
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4. Modeling extreme temperatures in northern Fennoscandia. 

4.1. Exploratory analysis. We first fitted GEV distributions separately 
at each of the 20 stations; see Thibaud et al. (2016). The estimates for 
the scale and shape parameters were not significantly different from site to 
site, but variation in the location parameter estimates suggested use of a 
non-stationary GEV model with spatially varying location parameter and 
constant scale and shape parameters. 

We then fitted non-stationary marginal models at the 20 locations by 
maximizing the independence likelihood, which assumes the independence 
of the temperatures from site to site. Non-stationarity was modeled using 
trend surfaces defined from the covariates mentioned in §2.1 and temporal 
trends in the location and scale parameters. We used the R package Spa- 
tialExtremes (Ribatet, 2015) to ht these models, which were compared using 
the Takeuchi information criterion, an equivalent of the Akaike information 
criterion for misspecified models. Our best model had all six covariates for 
the location surface, constant scale and shape parameters, and a temporal 
trend in the location only. Its fit was assessed using marginal QQ-plots; de¬ 
parture from the 95% confidence intervals was observed for five stations. 
This suggests that the trend surface model is not flexible enough to model 
the variability in the location parameters over the 20 stations. Inclusion of 
additional covariates did not improve the fit, motivating the use of a more 
flexible approach using random effects in a Bayesian hierarchical model. 

4.2. Hierarchical model. 

4.2.1. Definition. We consider two Bayesian hierarchical models based 
on the Brown-Resnick process. The first uses the Stephenson-Tawn model 
and estimates the partitions in a preliminary step through space-time declus¬ 
tering; see §§3.3, 4.2.3. The second imputes the partitions in the MGMC 
algorithm; see §3.4. Here we describe the part of the hierarchical model for 
the data conditional on the partitions, which is common to both approaches. 

Let yi = {yn, ..., yio) (i = 1,..., N) denote the observed minimum tem¬ 
peratures for the N years of data, let si,..., sd denote the D = 2t) locations 
of the stations such that yij = yfisfi, let tij £ M denote the covariate for 
the day of occurrence of the winter minimum y^j, dehned as the number of 
years from 1 January 2000, and hnally let tt* = (TTji,..., denote the 
partitions of {1,... ,D} for each year. When data are missing, the dimen¬ 
sion of yi is lower than D and so is the size of the corresponding partition. 
The Hrst level of the hierarchical model gives the probability density of the 
observed minima and the partitions tti, ..., ttw, conditional on the marginal 
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and dependence parameters -0, as 
(9) L({yi},{7ri} I 0) = 

N / Ki D 

exp[-V{fii{-yii),..., fw{-yiD)}] [-K-iJ/ji(-yii), • • ■ JiDi-yiD)}]Y[ flji-Vij) 

i=i \ k=i j=i 

where 

fijiy) = 

denotes the marginal transformation to the standard Frechet scale for ^ 0 
and the prime denotes derivative with respect to y; the negative signs for 
the yij’s in (9) correspond to modeling negated winter minima. 

The second level of the hierarchical model specihes the models for the 
exponent measnre V and the GEV parameters. We dehne a space-time model 
for /j, with the covariates discussed in §2.1, with a linear temporal trend, and 
with a Gaussian prior distribution as 

/i(s, t) ~ GP{X(s)/3, T^pj + at 

where W(s) denotes the vector of the intercept and covariates at location s, 
and GP{x(s), T^/o(-)} denotes a Gaussian process with mean x{s), variance 
T^, and correlation function p{h) = exp(— ||h||/5). The random effect allows 
the location parameters to vary smoothly over space, in addition to any effect 
of the covariates. We kept a and ^ constant over space and time because 
this appeared to suffice for a good ht. Thus pij = p{sj,tij), aij = a, and 
iij ~ ? each pair of replicate and location indices (i, j), and the marginal 
GEV distributions of the model at the observed data can be written as 

-Vij ~ GEV {Uj + atij,a, , 

where U = {Ui,... ,Ud) is a Gaussian vector with mean Xj3, where X 
is a X 7 matrix with rows X{sj), and covariance matrix {r'^p^Sj^ — 

We use a Brown-Resnick process with variogram 27 (h) = 

2(||h||/A)'*^ as a model for the exponent function V. 

4.2.2. Prior distributions. Prior distributions are assigned to the vari¬ 
ables at the last level of a hierarchical model. We used vague priors for 
all parameters except those of the latent Gaussian held; see Thibaud et al. 

(2016). The prior distributions for /3 and were chosen to obtain conjugate 
full conditional distributions for these parameters. 
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4.2.3. Space-time declustering. The use of the Stephenson-Tawn model 
involves choosing the partitions in (9), whose sets should correspond to inde¬ 
pendent individual events. Several approaches have been proposed to iden¬ 
tify clusters in univariate time series (e.g., Beirlant et ah, 2004, Chapter 10; 
Chavez-Demoulin and Davison, 2012). In the multivariate case, declustering 
techniques have been proposed by Coles and Tawn (1991) and Nadarajah 
(2001). Here we follow Coles and Tawn (1991), who identified clusters of 
storm surges by assuming that events separated by a fixed time lag are 
independent. 

In our application, a difficulty arises from the use of the dates of occur¬ 
rences. Because of the measurement precision of the temperatures (0.1°C), 
the same annual minimum may occur on different days, affecting the esti¬ 
mation of the partition; this is the case for 7% of the minima, with dates 
of occurrences separated by more than two days in 3% of cases. When a 
minimum has multiple days of occurrences, we choose one of these days at 
random to define our partition. 

For our temperature data, preliminary analysis suggests choosing a time 
lag of five days: two observations at the same location and separated by more 
than five days are assumed to be independent. To identify clusters for the 20 
time series, we use a single linkage (or friend-to-friend) method whereby a 
cluster is composed of data that are dependent in pairs. Thus if the minima 
occurred at time 1 for the first series, time 4 for the second and time 7 for 
the third, these three minima are considered to lie in the same cluster. With 
this procedure, the size of the partition for each year is at most five, and 
winter temperature minima usually occurred from widespread events that 
lasted for between five and ten days. 

4.3. MCMC algorithm. The MCMC algorithm for our Brown-Resnick 
model is nonstandard because the likelihood (9) cannot be calculated exactly 
but must be estimated. The likelihood involves the function V and its partial 
derivatives, which themselves involve the Gaussian cumulative distribution 
function in their expressions, and thus can only be estimated. We used the 
approach of Andrieu and Roberts (2009) to use a likelihood estimator in 
the MCMC algorithm. Under the assumption that the likelihood estimator 
is unbiased, that algorithm generates a Markov chain that has the targeted 
posterior distribution as stationary distribution. 

We estimated the likelihood (9) by replacing the Gaussian cumulative 
distribution functions in its expression by Monte Carlo estimators. Our es¬ 
timator is not unbiased, but we constructed the estimator to minimize the 
potential bias in the estimation of the posterior and obtain good mixing 
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properties for the Markov chain in a reasonable time; see Thibaud et al. 
(2016). For our dataset and the fixed partitions estimated in §4.2.3, a single 
estimation of the likelihood (9) takes about one second on a single CPU. 

We used a Gibbs sampler with Metropolis-Hastings steps to update the 
marginal and dependence parameters of the model. To minimize the number 
of likelihood estimations in the MCMC algorithm, and because some param¬ 
eters were found to be very dependent, we updated the parameters of the 
model in blocks; see Thibaud et al. (2016). With our choice of blocks, one 
iteration of the MCMC algorithm requires four likelihood estimations for 
the Stephenson-Tawn model. For the random partition model, the Gibbs 
sampler for the partitions requires estimating the probabilities (8), which 
increases the computational burden a bit more. 

Since one iteration of the MCMC algorithm takes several seconds on a 
single CPU, we used parallel computing, fitting 50 independent Markov 
chains each of length 15,000 and discarding the first 5000 iterations of each 
chain. The resulting chains were merged to obtain a final chain of length 
500,000. The MCMC diagnostics are discussed in Thibaud et al. (2016). 
The mean of the posterior distributions were used as pointwise estimators. 
Credible intervals were calculated from the quantiles of the Markov chains. 

The results of a simulation study suggest that the MCMC algorithm pro¬ 
vides sensible inferences on the model parameters: the posterior means were 
close to the true values and the empirical coverages of the credible intervals 
were close to the nominal values; see Thibaud et al. (2016). 

4.4. Results. We compare three non-stationary Brown-Resnick models. 
The first. Ml, uses the same trend surfaces as does the best marginal model 
discussed in §4.1 and is fitted using pairwise likelihood as implemented in 
the R package SpatialExtremes; this is our baseline model, to which we 
compare our new approaches. The second, M2, is the Bayesian hierarchical 
model that uses the fixed partitions from the declustering procedure. The 
third, M3, is the Bayesian hierarchical model with the random partitions. 

We discuss the estimates in terms of the negated location parameter /x, 
and negated trend a, which correspond to the original scale of minimum 
temperatures. The estimates for the three models are given in Table 1. We 
assessed the fit of the marginal distributions using QQ-plots (see Thibaud 
et al., 2016). We assessed the fit of the dependence model in terms of empir¬ 
ical pairwise extremal coefficients, computed using the empirical marginal 
distributions of the data to avoid potential bias due to the estimation of 
the marginals (Ribatet, 2013); see Fig. 3. As a global assessment of the fit 
in terms of the marginals and dependence, we computed the maximum of 
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Table 1 

Estimated posterior mean parameters and 95% confidence/credible intervals (in 
parentheses) from the trend surface model Ml, the Bayesian model based on declustering 
M2 and the random partition Bayesian model M3. The second column is the estimated 
average location parameter yij = ij,{sj,0) over the 20 stations. 


Model 

Tj 

—a 

(7 


A 

k 

Ml 

-34.9 

0.07 

3.4 

-0.15 

371 

0.40 

( 

-35.8, -34.1) 

(0.03, 0.11) 

(3.0, 3.7) 

(-0.19, -0.10) 

(40,702) 

(0.34, 0.47) 

M2 

-34.6 

0.09 

4.0 

-0.11 

1811 

0.54 

( 

-35.6, -33.7) 

(0.05, 0.13) 

(3.6, 4.6) 

(-0.15, -0.06) 

(977,3615) 

(0.46, 0.63) 

M3 

-35.0 

0.06 

3.5 

-0.10 

1086 

0.53 

( 

-36.0, -33.8) 

(0.00, 0.11) 

(3.0,4.!) 

o 

d 

1 

d 

(466,3143) 

(0.43, 0.65) 



Fig 3. Binned estimates of empirical pairwise extremal coefficients (black, with their 95% 
confidence intervals in gray). Curves are from the fitted Brown-Resnick models (see Ta¬ 
ble 1): Ml (dotted), M2 (dashed), and M3 (plain). 


winter minimum temperatures over four different sets of stations, for the 
53 years of the dataset, and compared the observed values with the predic¬ 
tions from the three Brown-Resnick models, computed by simulation using 
the algorithm of Dieker and Mikosch (2015); see Fig. 4. Further diagnos¬ 
tics for the prediction of the minimum and the averages of winter minimum 
temperatures are given in Thibaud et al. (2016). 

For the trend surface model, Ml, the marginal QQ-plots show poor fits for 
six stations. Moreover, the model severely underestimates the dependence 
in terms of the pairwise extremal coefficient. This can be explained by the 
inadequacy of the fitted marginal distributions, which bias the dependence 
estimates; Ribatet (2013) also observed this when modeling wind gust data. 
In terms of predicting the maxima over groups of stations, the model per- 
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Fig 4. Model checking for the three Brown-Resnick models (see Table 1): Ml (top row), 
M2 (middle row) and MS (bottom row). The QQ-plots compare the predictions and the 
observed maximum of winter minimum temperatures (° C) over some groups of stations, 
mi = maxjgj yij ("i = 1,..., 53j. The columns, from left to right, correspond to the group 
of stations J = {5,16,17,18,19} (western region), J = {2,3,4,6,7,8,9,12,20} (eastern 
region), J = {5, 6, 8, 9,10,11, 12,13,14, 15,19, 20} (northern region), and all 20 stations. 
Grey dashed lines are 95% overall confidence bands, and the solid grey line indicates a 
perfect fit. 


forms poorly: many points lie outside the 95% confidence bounds. In general, 
this model does not predict extreme temperatures well. 

For the Bayesian model based on the declustering, M2, the marginal QQ- 
plots show that the sample quantiles are outside the 95% conhdence bands 
for three stations, and underestimate them systematically at all stations; 
the estimated scale parameter a seems too large. The model overestimates 
the pairwise dependence somewhat, and tends to overestimate return levels 
for the maxima over different groups of stations. 

For the random partition model, M3, the marginal QQ-plots show a good 
ht of the GEV distributions: the empirical quantiles are in the 95% confi¬ 
dence bands for all the stations. The empirical pairwise extremal coefficients 
agree adequately with the curve from the model. In terms of the multivari¬ 
ate predictions, M3 can predict the maximum, minimum and average of 
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Fig 5. Posterior mean of the random effect in the location parameter (a) and mean of the 
GEV distribution for the year 2016 (b). The scale corresponds to temperature minima (° C). 


the minima better than does M2, with all the observations within the 95% 
confidence bounds; see Fig. 4 and Thibaud et al. (2016). This model ap¬ 
pears to represent both the marginal and spatial behavior of extreme low 
temperatures adequately. 

We further discuss the results for the random partition Bayesian model 
M3. Fig. 5(a) shows the posterior mean of the random effects for the lo¬ 
cation parameter over the convex hull of the meteorological stations (see 
Fig. 1), and the mean of the GEV distribution for the year 2016. We restrict 
our analysis to the convex hull of the stations because extrapolation out¬ 
side this region is too uncertain. The posterior mean of the random effect 
in the location parameters shows the variability in this parameter that can¬ 
not be explained by the covariates. The posterior mean for the fitted GEV 
distributions shows lower temperatures in the inland region. The minima de¬ 
crease as latitude, longitude, relative elevation and distance from the Arctic 
ocean increase, and increase as lake cover and elevation increase. Similar 
covariate effects were found by Aalto, le Roux and Luoto (2014) for extreme 
low temperatures in northern Fennoscandia. The estimated shape param¬ 
eter, corresponding to Weibull—so-called light-tailed—distributions, agrees 
with previous studies on extreme temperatures. The dependence param¬ 
eters correspond to strong dependence, with pairwise extremal coefficient 
9 = 1.42 (1.34,1.50) at 400 km, while k = 0.53 (0.43,0.65) implies that the 
realizations from the Brown-Resnick process are continuous but nowhere 
differentiable. 

4.5. Forecasts for extreme low temperatures for 2016-2030. We base our 
forecasts on the random partition Bayesian model M3. Changes in extremes 
are entirely based on the linear trend a in the location parameter. This 
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1980 


2016 


2030 




- 0.6 
- 0.4 
■- 0.2 
L 0.0 


Fig 6. Probability that the winter minimum temperature is higher than — 36° C for 1980, 
2016 and 2030. 


translates to changes in the mean of the GEV distribution at location s and 
year t, which is 

fi{s, 0) + at + ar Hr(l - 0 - 1}, e < 1, e / 0. 

The estimate —a = 0.06 (0.00,0.11) corresponds to an increase of 0.6°C 
per decade in the mean winter minimum temperature. Fig. 5(b) shows the 
predicted mean winter minimum temperature for 2016. Fig. 6 shows the 
probability that winter minimum temperatures are higher than — 36°C, the 
threshold needed for protecting northern Fennoscandia forests from out¬ 
breaks of Epirrita autumnata, for 1980, 2016 and 2030. There is a strong 
change in these probabilities. Over the region defined by the stations, the 
winter minimum temperatures had low probability of exceeding — 36°C be¬ 
fore 1980, rising to about 0.54 for 2016, and then to 0.61 for 2030, according 
to the htted model. The predictions must be interpreted with care, owing 
to the large uncertainty in the estimation of a. 

Spatial simulation of winter minimum temperatures is useful for estimat¬ 
ing joint return levels over certain regions, which could be useful for assessing 
the risks of pest outbreaks over a region for example. Fig. 7 shows three sim¬ 
ulations of winter minima fields from M3, which show how the strong depen¬ 
dence in the Brown-Resnick model results in winter minima that are either 
low or high throughout the region. The spatial heterogeneity is mostly due 
to non-stationarity in the marginal distributions. Conditional simulations 
are possible (Dombry, Eyi-Minko and Ribatet, 2013) and could be useful as 
input to models to link past pest outbreaks with extreme temperatures, for 
example. 

5. Discussion. Although the Stephenson-Tawn approach was found 
to perform well on simulated data when the true partitions are known, the 






Fig 7. Three simulated minimum temperature fields C 2016. 


results of our application show that it can be inappropriate in practice. The 
hierarchical model for the marginals does not give a good fit to the marginal 
GEV distributions, notwithstanding the additional flexibility provided by 
the random effects. 

We explain this by the use of the joint model for the minima and the 
partitions, and the declustering. The first of these requires good agreement 
of the data with the distribution of the block maxima and the partitions, 
and this is a strong assumption in practice. Although the Brown-Resnick 
dependence structure is quite flexible for modeling stationary isotropic de¬ 
pendence structures, the distribution of the underlying partitions can be 
very different from the partitions obtained through the declustering. In the 
univariate case, previous studies have shown that declustering can bias es¬ 
timators of the marginal parameters if the clusters are poorly-defined, and 
we can expect similar problems in the multivariate case. In our application, 
assuming that data separated by more than five days are independent in 
pairs results in partitions of small sizes each year; see Fig. 8. A similar size 
for the true partitions for a Brown-Resnick process can only be obtained 
if the dependence in the process is very strong, and this is incompatible 
with the empirical extremal coefficients shown in Fig. 3. Fig. 8 also shows 
the sizes of the random partitions in the MCMC algorithm for model M3; 
in several cases the size of the partitions from the declustering procedure 
have low probability for the random partition model. To further compare 
the partitions from the MCMC algorithm with that from the declustering 
procedure, we computed the Rand index (Rand, 1971), which is, for each 
year, the proportion of pairs of locations that are simultaneously either to¬ 
gether or separate in both partitions; we then averaged the Rand index over 
the years. It varies from 0.52 to 0.63, indicating that pairwise classification 
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Fig 8. Estimated distributions of the partition sizes for each year. Grey boxes show the 
relative probability for each size (a wider box corresponds to a higher probability), estimated 
from the Gibbs sampler. The black dots show the sizes of the partitions obtained using the 
deelustering procedure discussed in §4.^.5. 


agrees for less than 63% of the pairs between M2 and M3. The size of the 
partitions is typically larger in the random partitions, and the relation with 
the occurrence dates is unclear. For example, in the winter 1973-1974, the 
winter minima at the 16 stations with no missing data occurred on four con¬ 
secutive days and the declustering procedure hnds a unique partitioning set, 
whereas just 7% of the partitions visited in the MCMC run have a unique 
set and 37% have four partitioning sets or more. 

These results suggest that a different declustering procedure must be used. 
We varied the number of days used in our declustering scheme, and also 
considered another scheme that forces the stations that are at least 150 km 
apart to be in different clusters. The parameter estimates changed with the 
scheme but the diagnostic plots were never very satisfactory. 

6. Conclusion. We have shown how Bayesian inference can be con¬ 
ducted for the Brown-Resnick process, either using the occurrence times of 
maxima and a declustering procedure, or by including the partitions in an 
MCMC algorithm. 

The partitions add powerful information into the model about the depen¬ 
dence structure. However our application shows that using fixed partitions 
can badly bias the estimation of the marginal and dependence parameters 
of a max-stable process. In a second approach we included the partitions in 
the MCMC algorithm without using information on the occurrence times, 
and this produced superior results. A more complex approach could use the 
occurrence times to construct a prior distribution on the space of the parti- 
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tions, to add more information into the model and improve the efficiency of 
the estimation, while accounting for the uncertainty on the partitions, but 
this might be rather complex. Wadsworth (2015) proposed using second- 
order terms in the Stephenson-Tawn model to reduce potential bias due 
to the partitions, but this seems difficult in our case because the number 
of terms added to the likelihood would be huge and the MCMC algorithm 
would become even slower. 

Our Bayesian model can be generalized in several ways. We focused on a 
stationary isotropic Brown-Resnick process for simplicity, and it appears to 
be sufficient for our application. A similar Bayesian hierarchical model can 
be constructed for the extremal-t process using ideas of Thibaud and Opitz 
(2015), anisotropy would be easily incorporated using the idea of Engelke 
et al. (2015), and so perhaps could the work of Huser and Genton (2016) 
on non-stationary dependence models for extremes. However, the general¬ 
ization of the present method to more complex models and larger datasets 
may be limited by computational feasibility. Monte Carlo estimation of the 
likelihood is critical to keep the running times reasonable, and is relatively 
fast and accurate in low dimensions, but with more sites, more simulations 
may be needed to obtain reliable inference. 

The model predicts an increase of 0.6°C per decade in the mean winter 
minimum temperatures in northern Fennoscandia. Further work to validate 
the predictions from our model and to quantify the possible effects of global 
change on boreal forests would be valuable. The linear trend in time pro¬ 
vides sensible short-term predictions, but for longer periods more complex 
approaches would be necessary, though limited by the availability of reliable 
data. One possibility would be to link the changes in extreme temperatures 
to mean temperatures and use predictions from global climate models under 
different scenarios as inputs to forecasting. 
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SUPPLEMENTARY MATERIAL 

Supplementary Material for Bayesian inference for the Brown 
Resnick process, with an application to extreme low temperatures 
(; .pdf). The online supplement provides additional results for the exploratory 
analysis and the MCMC algorithm used to fit the two Brown-Resnick mod¬ 
els, and reports additional diagnostics for the fit of the three models consid¬ 
ered in the manuscript. 
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